A Binary-Categorization Approach for Classifying Multiple-Record Web Documents Using Application Ontologies and a Probabilistic Model

نویسندگان

  • Yiu-Kai Ng
  • June Tang
  • Michael A. Goodrich
چکیده

The amount of information available on the World Wide Web has been increasing dramatically in recent years. To enhance speedy searching and retrieving Web documents of interest, researchers and practitioners have partially relied on various information retrieval techniques. In this paper; we propose a probabilistic model to classib Web documents into relevant documents and irrelevant documents with respect to a particular application ontology, which is a conceptual-model snippet of standard ontologies. Our probabilistic model is based on multivariate statistical analysis and is different from the conventional probabilistic information retrieval models. The experiments we have conducted on a set of representative Web documents indicate that the proposed probabilistic model is promising in binary-categorization of multiple-record Web documents.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multilabel Classification of Documents with Mapreduce

Multilabel classification is the problem of assigning a set of positive labels to an instance and recently it is highly required in applications like protein function classification, music categorization, gene classification and document classification for easy identification and retrieving of information. Labeling the documents of the web manually is a time consuming and a difficult task due t...

متن کامل

Parametric Mixture Models for Multi-Labeled Text

We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of te...

متن کامل

Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents

This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...

متن کامل

Classification of Web Documents Using Concept Extraction from Ontologies

In this paper, we deal with the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present the ontology-based web content mining methodology that contains such main stages as creation of ontology for the specified domain, collecting a training set of labeled documents, building a classification model in this domain using the constructed onto...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001